{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convolution with Maxpool Demo Run\n",
    "___\n",
    "This notebook shows a single run of the convolution and maxpool using Darius IP. The input feature map is read from memory, processed and output feature map is captured for one single convolution with maxpool command. The cycle count and efficiency for the full operation is read and displayed at the end.   \n",
    "The input data in memory is set with random integers in this notebook to test the run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Terminology\n",
    "| Term    | Description                              |\n",
    "| :------ | :--------------------------------------- |\n",
    "| IFM     | Input volume                             |\n",
    "| Weights | A set of filter volumes                  |\n",
    "| OFM     | Output volume                            |\n",
    "\n",
    "#### Arguments\n",
    "| Convolution Arguments | Description                              |\n",
    "| --------------------- | ---------------------------------------- |\n",
    "| ifm-h, ifm-w          | Height and width of an input feature map in an IFM volume |\n",
    "| ifm-d                 | Depth of the IFM volume                  |\n",
    "| kernel-h, kernel-w    | Height and width of the weight filters   |\n",
    "| stride                | Stride for the IFM volume                |\n",
    "| pad                   | Pad for the IFM volume                   |\n",
    "| Channels              | Number of Weight sets/number of output feature maps |\n",
    "| Pool-kernel-h, Pool-kernel-w | Height and width of maxpool kernel |\n",
    "| Pool-stride           | Stride for the Convolved volume           |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Block diagram"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](./images/darius_bd.png)\n",
    "<center>Figure 1</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Figure 1 presents a simplified block diagram including Darius CNN IP that is used for running convolution tasks. The Processing System (PS) represents the ARM processor, as well as the external DDR. The Programmable Logic (PL) incorporates the Darius IP for running convolution tasks, and an AXI Interconnect IP. The AXI_GP_0 is an AXILite interface for control signal communication between the ARM and the Darius IP. The data transfer happens through the AXI High Performance Bus, denoted as AXI_HP_0. __For more information about the Zynq architecture, visit:__ [Link](https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataflow   \n",
    "\n",
    "The dataflow begins by creating an input volume, and a set of weights in python local memory. In Figure 1, these volumes are denoted as “ifm_sw”, and “weights_sw”, respectively. After populating random data, the ARM processor reshapes and copies these volumes into contiguous blocks of shared memory, represented as “ifm” and “weights” in Figure 1, using the “reshape_and_copy()” function. Once the data is accessible by the hardware, the PS starts the convolution operation by asserting the start bit of Darius IP, through the “AXI_GP_0” interface. Darius starts the processing by reading the “ifm” and “weights” volumes from the external memory and writing the results back to a pre-allocated location, shown as “ofm” in Figure 1.\n",
    "Notes:\n",
    "-\tWe presume the data in the “ifm_sw” and “weight_sw”, are populated in a row-major format. In order to get the correct results, these volumes have to be reshaped into an interleaved format, as expected by the Darius IP. \n",
    "-\tNo data reformatting is required for subsequent convolution calls to Darius, as it produces the “ofm” volume in the same format as it expects the “ifm” volume.\n",
    "-\tSince the shared memory region is accessible both by the PS and PL regions, one can perform any post-processing steps that may be required directly on the “ofm” volume without transferring data back-and-forth to the python local memory. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Set the arguments for the convolution in CNNDataflow IP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HOST CMD: CNNDataflow IP Arguments set are - IH 14, IW 14, ID 64, KH 3, KW 3, P 0, S 1, CH 32, PKH 2, PKW 2, PS 2\n"
     ]
    }
   ],
   "source": [
    "# Input Feature Map (IFM) dimensions\n",
    "ifm_height = 14\n",
    "ifm_width = 14\n",
    "ifm_depth = 64\n",
    "\n",
    "# Kernel Window dimensions\n",
    "kernel_height = 3\n",
    "kernel_width = 3\n",
    "\n",
    "# Other arguments\n",
    "pad = 0\n",
    "stride = 1\n",
    "\n",
    "# Channels\n",
    "channels = 32\n",
    "\n",
    "# Maxpool dimensions\n",
    "pool_kernel_height = 2\n",
    "pool_kernel_width = 2\n",
    "pool_stride = 2\n",
    "\n",
    "print(\n",
    "    \"HOST CMD: CNNDataflow IP Arguments set are - IH %d, IW %d, ID %d, KH %d,\"\n",
    "    \" KW %d, P %d, S %d, CH %d, PKH %d, PKW %d, PS %d\"\n",
    "    % (ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width,\n",
    "       pad, stride, channels, pool_kernel_height, pool_kernel_width, pool_stride))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Download `Darius Convolution IP` bitstream"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bitstream download status: True\n"
     ]
    }
   ],
   "source": [
    "from pynq import Overlay\n",
    "\n",
    "overlay = Overlay(\n",
    "    \"/opt/python3.6/lib/python3.6/site-packages/pynq/overlays/darius/\"\n",
    "    \"convolution.bit\")\n",
    "overlay.download()\n",
    "print(f'Bitstream download status: {overlay.is_loaded()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3:  Create MMIO object to access the CNNDataflow IP\n",
    "For more on MMIO visit: [MMIO Documentation](http://pynq.readthedocs.io/en/latest/overlay_design_methodology/pspl_interface.html#mmio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Idle state: 0x4\n"
     ]
    }
   ],
   "source": [
    "from pynq import MMIO\n",
    "\n",
    "# Constants\n",
    "CNNDATAFLOW_BASEADDR = 0x43C00000\n",
    "NUM_COMMANDS_OFFSET = 0x60\n",
    "CMD_BASEADDR_OFFSET = 0x70\n",
    "CYCLE_COUNT_OFFSET = 0xd0\n",
    "\n",
    "cnn = MMIO(CNNDATAFLOW_BASEADDR, 65536)\n",
    "print(f'Idle state: {hex(cnn.read(0x0, 4))}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Create Xlnk object \n",
    "Xlnk object (Memory Management Unit) for allocating contiguous array in memory for data transfer between software and hardware\n",
    "\n",
    "<div class=\"alert alert-danger\">Note: You may run into problems if you exhaust and do not free memory buffers – we only have 128MB of contiguous memory, so calling the allocation twice (allocating 160MB) would lead to a “failed to allocate memory” error. Do a xlnk_reset() before re-allocating memory or running this cell twice  </div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pynq import Xlnk\n",
    "import numpy as np\n",
    "\n",
    "# Constant\n",
    "SIZE = 5000000  # 20 MB of numpy.uint32s\n",
    "\n",
    "mmu = Xlnk()\n",
    "\n",
    "# Contiguous memory buffers for CNNDataflow IP convolution command, IFM Volume,\n",
    "# Weights and OFM Volume. These buffers are shared memories that are used to \n",
    "# transfer data between software and hardware\n",
    "cmd = mmu.cma_array(SIZE, dtype=np.int16)\n",
    "ifm = mmu.cma_array(SIZE, dtype=np.int16)\n",
    "weights = mmu.cma_array(SIZE, dtype=np.int16)\n",
    "ofm = mmu.cma_array(SIZE, dtype=np.int16)\n",
    "\n",
    "# Saving the base phyiscal address for the command, ifm, weights, and\n",
    "# ofm buffers. These addresses will be used later to copy and transfer data \n",
    "# between hardware and software\n",
    "cmd_baseaddr = cmd.physical_address\n",
    "ifm_baseaddr = ifm.physical_address\n",
    "weights_baseaddr = weights.physical_address\n",
    "ofm_baseaddr = ofm.physical_address"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5: Functions to print Xlnk statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Available Memory (KB): 91136\n",
      "Available Buffers: 4\n"
     ]
    }
   ],
   "source": [
    "def get_kb(mmu):\n",
    "    return int(mmu.cma_stats()['CMA Memory Available'] // 1024)\n",
    "\n",
    "\n",
    "def get_bufcount(mmu):\n",
    "    return int(mmu.cma_stats()['Buffer Count'])\n",
    "\n",
    "\n",
    "def print_kb(mmu):\n",
    "    print(\"Available Memory (KB): \" + str(get_kb(mmu)))\n",
    "    print(\"Available Buffers: \" + str(get_bufcount(mmu)))\n",
    "\n",
    "\n",
    "print_kb(mmu)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6: Construct convolution command\n",
    "Check that arguments are in supported range and construct convolution command for hardware"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All IP arguments are in supported range\n",
      "Command to CNNDataflow IP: \n",
      "b'\\x0e\\x00\\x0e\\x00\\x03\\x00\\x03\\x00\\x01\\x00\\x00\\x00\\x0c\\x00\\x0c\\x00\\x08\\x00\\x04\\x00\\x01\\x00\\x01\\x00\\x00\\x000\\x17 \\x06\\x00\\x00\\x001\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00p\\x18\\x90\\x00\\x00\\x00\\x00\\x00\\xd0\\x17@\\x02\\x00\\x00\\x00\\x12\\x00\\x00\\x00\\x00\\x00\\x00\\x0c\\x00\\x0c\\x00\\x02\\x00\\x02\\x00\\x06\\x00\\x06\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\n"
     ]
    }
   ],
   "source": [
    "from darius import darius_lib\n",
    "\n",
    "conv_maxpool   = darius_lib.Darius(ifm_height, ifm_width, ifm_depth,\n",
    "                                   kernel_height, kernel_width, pad, stride,\n",
    "                                   channels, pool_kernel_height, \n",
    "                                   pool_kernel_width, pool_stride,\n",
    "                                   ifm_baseaddr, weights_baseaddr,\n",
    "                                   ofm_baseaddr)\n",
    "\n",
    "IP_cmd = conv_maxpool.IP_cmd()\n",
    "\n",
    "print(\"Command to CNNDataflow IP: \\n\" + str(IP_cmd))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7: Create IFM volume and weight volume.\n",
    "Volumes are created in software and populated with random values in a row-major format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from random import *\n",
    "\n",
    "ifm_sw = np.random.randint(0,255, ifm_width*ifm_height*ifm_depth, dtype=np.int16)\n",
    "weights_sw = np.random.randint(0,255, channels*ifm_depth*kernel_height*kernel_width, dtype=np.int16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reshape IFM volume and weights\n",
    "Volumes are reshaped from row-major format to IP format and data is copied to their respective shared buffer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "conv_maxpool.reshape_and_copy_ifm(ifm_sw, ifm)\n",
    "conv_maxpool.reshape_and_copy_weights(weights_sw, weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8: Load  convolution command and start CNNDataflow IP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "state: IP IDLE; Starting IP\n"
     ]
    }
   ],
   "source": [
    "# Send convolution command to CNNDataflow IP\n",
    "cmd_mem = MMIO(cmd_baseaddr, SIZE)\n",
    "cmd_mem.write(0x0, IP_cmd)\n",
    "\n",
    "# Load the number of commands and command physical address to offset addresses\n",
    "cnn.write(NUM_COMMANDS_OFFSET, 1)\n",
    "cnn.write(CMD_BASEADDR_OFFSET, cmd_baseaddr)\n",
    "\n",
    "# Start Convolution if CNNDataflow IP is in Idle state\n",
    "state = cnn.read(0x0)\n",
    "if state == 4: # Idle state\n",
    "    print(\"state: IP IDLE; Starting IP\")\n",
    "    start = cnn.write(0x0, 1) # Start IP\n",
    "    start\n",
    "else:\n",
    "    print(\"state %x: IP BUSY\" % state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check status of the CNNDataflow IP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "state: IP DONE\n"
     ]
    }
   ],
   "source": [
    "# Check if Convolution IP is in Done state\n",
    "state = cnn.read(0x0)\n",
    "if state == 6: # Done state\n",
    "    print(\"state: IP DONE\")\n",
    "else:\n",
    "    print(\"state %x: IP BUSY\" % state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 9: Read back first few words of OFM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0x6aeb\n",
      "0xe03\n",
      "0x11d4\n",
      "0x5c9c\n"
     ]
    }
   ],
   "source": [
    "for i in range(0, 15, 4):\n",
    "    print(hex(ofm[i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 10: Read cycle count and efficiency of the complete run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CNNDataflow IP cycles: 44367\n",
      "Effciency: 93.47%\n"
     ]
    }
   ],
   "source": [
    "hw_cycles = cnn.read(CYCLE_COUNT_OFFSET, 4)\n",
    "efficiency = conv_maxpool.calc_efficiency(hw_cycles)\n",
    "print(\"CNNDataflow IP cycles: %d\\nEffciency: %.2f%%\" % (hw_cycles, efficiency))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reset Xlnk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Available Memory (KB): 129460\n",
      "Available Buffers: 0\n",
      "Cleared Memory!\n"
     ]
    }
   ],
   "source": [
    "mmu.xlnk_reset()\n",
    "print_kb(mmu)\n",
    "print(\"Cleared Memory!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
